This invention relates to a document processing method and apparatus for processing a document whose internal structure is defined in terms of elements, and to a recording medium having recorded thereon a program for processing such a document. More particularly, it relates to a document processing method and apparatus in which each document is classified based on the degree of interrelation of the semantics contained in the document, and to a recording medium having recorded thereon a document processing program that performs this classification.
The WWW (World Wide Web) has hitherto been used as an application service that furnishes hypertext information in a window format. The WWW is a system for executing document processing such as creating, publishing or sharing documents, and it suggests the possibility of a new style of document. From the standpoint of practical utilization of documents, however, there is a demand for advanced document processing that goes beyond the WWW, such as document classification or summarization based on the document contents. For such advanced document processing, mechanical processing of the document contents is indispensable.
The mechanical processing of document contents remains difficult for the following reasons. First, HTML (HyperText Markup Language), the language used to describe hypertext, prescribes how sentences are presented but hardly prescribes the document contents. Second, the hypertext network constructed between sentences is not convenient for the reader to exploit in understanding the document contents. Third, a writer usually writes without taking account of the convenience of the user, and the convenience of the user is not reconciled with that of the writer.
While the WWW is a system that suggests the possibility of a new style of document, it cannot realize advanced document processing because it does not process documents mechanically. If advanced document processing is to be executed on the WWW, documents need to be processed mechanically.
To enable mechanical document processing, systems supporting it have been developed on the basis of achievements in natural language research. As a first step in document processing based on natural language research, mechanical document processing has been proposed that exploits tags. These tags are presupposed to be attached to the document as attribute information on the internal structure of the document created by its writer.
In keeping with the progress in computers and networks, there is a demand for improved functions in document processing for creating, labelling or changing a text document, such as sentence processing or indexing dependent on the sentence contents. To realize document processing with such improved functions, it is necessary to process documents based on the degree of interrelation of the respective semantics in the document.